AITopics | activation matrix

Collaborating Authors

activation matrix

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Randomized Matrix Sketching for Neural Network Training and Gradient Monitoring

Antil, Harbir, Verma, Deepanshu

arXiv.org Artificial IntelligenceOct-2-2025

Neural network training relies on gradient computation through backpropagation, yet memory requirements for storing layer activations present significant scalability challenges. We present the first adaptation of control-theoretic matrix sketching to neural network layer activations, enabling memory-efficient gradient reconstruction in backpropagation. This work builds on recent matrix sketching frameworks for dynamic optimization problems, where similar state trajectory storage challenges motivate sketching techniques. Our approach sketches layer activations using three complementary sketch matrices maintained through exponential moving averages (EMA) with adaptive rank adjustment, automatically balancing memory efficiency against approximation quality. Empirical evaluation on MNIST, CIFAR-10, and physics-informed neural networks demonstrates a controllable accuracy-memory tradeoff. We demonstrate a gradient monitoring application on MNIST showing how sketched activations enable real-time gradient norm tracking with minimal memory overhead. These results establish that sketched activation storage provides a viable path toward memory-efficient neural network training and analysis.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2510.00442

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Review for NeurIPS paper: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing SystemsJan-21-2025, 09:32:17 GMT

Summary and Contributions: The authors analyze the effect of gradient quantization for quantized training in a principled fashion, and introduce two methods that reduce the variance of the gradients when doing quantized training. Still I hold that if FQT is compared to QAT, you should quantize the weights and not keep shadow weights. This is what I meant with having the actual weights quantized, and the updates quantized as well. In most FQT applications that are parallelized in compute, you are very often memory movement bound, meaning you're playing a game of reducing memory as much as possible. The gradients are calculated on the fly, used and discarded in the backward pass, the memory overhead of them is small.

activation matrix, low-bitwidth training, statistical framework, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Understanding and Analyzing Model Robustness and Knowledge-Transfer in Multilingual Neural Machine Translation using TX-Ray

Saxena, Vageesh, Loáiciga, Sharid, Rethmeier, Nils

arXiv.org Artificial IntelligenceDec-18-2024

Neural networks have demonstrated significant advancements in Neural Machine Translation (NMT) compared to conventional phrase-based approaches. However, Multilingual Neural Machine Translation (MNMT) in extremely low-resource settings remains underexplored. This research investigates how knowledge transfer across languages can enhance MNMT in such scenarios. Using the Tatoeba translation challenge dataset from Helsinki NLP, we perform English-German, English-French, and English-Spanish translations, leveraging minimal parallel data to establish cross-lingual mappings. Unlike conventional methods relying on extensive pre-training for specific language pairs, we pre-train our model on English-English translations, setting English as the source language for all tasks. The model is fine-tuned on target language pairs using joint multi-task and sequential transfer learning strategies. Our work addresses three key questions: (1) How can knowledge transfer across languages improve MNMT in extremely low-resource scenarios? (2) How does pruning neuron knowledge affect model generalization, robustness, and catastrophic forgetting? (3) How can TX-Ray interpret and quantify knowledge transfer in trained models? Evaluation using BLEU-4 scores demonstrates that sequential transfer learning outperforms baselines on a 40k parallel sentence corpus, showcasing its efficacy. However, pruning neuron knowledge degrades performance, increases catastrophic forgetting, and fails to improve robustness or generalization. Our findings provide valuable insights into the potential and limitations of knowledge transfer and pruning in MNMT for extremely low-resource settings.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2412.13881

Country:

Europe > Finland > Uusimaa > Helsinki (0.24)
Europe > Germany > Brandenburg > Potsdam (0.04)
North America > Cuba > La Habana Province > Havana (0.04)
(8 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On GNN explanability with activation rules

Veyrin-Forrer, Luca, Kamal, Ataollah, Duffner, Stefan, Plantevit, Marc, Robardet, Céline

arXiv.org Artificial IntelligenceJun-17-2024

GNNs are powerful models based on node representation learning that perform particularly well in many machine learning problems related to graphs. The major obstacle to the deployment of GNNs is mostly a problem of societal acceptability and trustworthiness, properties which require making explicit the internal functioning of such models. Here, we propose to mine activation rules in the hidden layers to understand how the GNNs perceive the world. The problem is not to discover activation rules that are individually highly discriminating for an output of the model. Instead, the challenge is to provide a small set of rules that cover all input graphs. To this end, we introduce the subjective activation pattern domain. We define an effective and principled algorithm to enumerate activations rules in each hidden layer. The proposed approach for quantifying the interest of these rules is rooted in information theory and is able to account for background knowledge on the input graph data. The activation rules can then be redescribed thanks to pattern languages involving interpretable features. We show that the activation rules provide insights on the characteristics used by the GNN to classify the graphs. Especially, this allows to identify the hidden features built by the GNN through its different layers. Also, these rules can subsequently be used for explaining GNN decisions. Experiments on both synthetic and real-life datasets show highly competitive performance, with up to 200% improvement in fidelity on explaining graph classification over the SOTA methods.

activation rule, graph, inside-gnn, (15 more...)

arXiv.org Artificial Intelligence

2406.11594

Country:

Europe > France (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Time Elastic Neural Networks

Marteau, Pierre-François

arXiv.org Artificial IntelligenceJun-13-2024

We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN), for multivariate time series classification. The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability, as well as a new way of considering attention. In addition, this architecture is capable of learning a dropout strategy, thus optimizing its own architecture.Behind the design of this architecture, our overall objective is threefold: firstly, we are aiming at improving the accuracy of instance based classification approaches that shows quite good performances as far as enough training data is available. Secondly we seek to reduce the computational complexity inherent to these methods to improve their scalability. Ideally, we seek to find an acceptable balance between these first two criteria. And finally, we seek to enhance the explainability of the decision provided by this kind of neural architecture.The experiment demonstrates that the stochastic gradient descent implemented to train a teNN is quite effective. To the extent that the selection of some critical meta-parameters is correct, convergence is generally smooth and fast.While maintaining good accuracy, we get a drastic gain in scalability by first reducing the required number of reference time series, i.e. the number of teNN cells required. Secondly, we demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell. Finally, we show that the analysis of the activation and attention matrices as well as the reference time series after training provides relevant information to interpret and explain the classification results.The comparative study that we have carried out and which concerns around thirty diverse and multivariate datasets shows that the teNN obtains results comparable to those of the state of the art, in particular similar to those of a network mixing LSTM and CNN architectures for example.

architecture, matrix, time sery, (16 more...)

arXiv.org Artificial Intelligence

2405.17516

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Convergence of Deep ReLU Networks

Xu, Yuesheng, Zhang, Haizhang

arXiv.org Artificial IntelligenceJan-10-2023

We explore convergence of deep neural networks with the popular ReLU activation function, as the depth of the networks tends to infinity. To this end, we introduce the notion of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function by multiplications with activation matrices on activation domains, we obtain an explicit expression of the ReLU network. We then identify the convergence of the ReLU networks as convergence of a class of infinite products of matrices. Sufficient and necessary conditions for convergence of these infinite products of matrices are studied. As a result, we establish necessary conditions for ReLU networks to converge that the sequence of weight matrices converges to the identity matrix and the sequence of the bias vectors converges to zero as the depth of ReLU networks increases to infinity. Moreover, we obtain sufficient conditions in terms of the weight matrices and bias vectors at hidden layers for pointwise convergence of deep ReLU networks. These results provide mathematical insights to the design strategy of the well-known deep residual networks in image classification.

artificial intelligence, convergence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2107.1253

Country:

North America > United States > New York (0.04)
North America > Canada (0.04)
Asia > China > Guangdong Province > Zhuhai (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Detecting Memorization in ReLU Networks

Collins, Edo, Bigdeli, Siavash Arjomand, Süsstrunk, Sabine

arXiv.org Machine LearningOct-8-2018

We propose a new notion of'non-linearity' of a network layer with respect to an input batch that is based on its proximity to a linear system, which is reflected in the nonnegative rank of the activation matrix. Considering batches of similar samples, we find that high non-linearity in deep layers is indicative of memorization. Furthermore, by applying our approach layer-by-layer, we find that the mechanism for memorization consists of distinct phases. We perform experiments on fully-connected and convolutional neural networks trained on several image and audio datasets. Our results demonstrate that as an indicator for memorization, our technique can be used to perform early stopping. A fundamental challenge in machine learning is balancing the bias-variance tradeoff, where overly simple learning models underfit the data (suboptimal performance on the training data) and overly complex models are expected to overfit or memorize the data (perfect training set performance, but suboptimal test set performance). The latter direction of this tradeoff has come into question with the observation that deep neural networks do not memorize their training data despite having sufficient capacity to do so (Zhang et al., 2016), the explanation of which is a matter of much interest. Due to their convenient gradient properties and excellent performance in practice, rectified-linear units (ReLU) have been widely adopted and are now ubiquitous in the field of deep learning.

artificial intelligence, machine learning, memorization, (15 more...)

arXiv.org Machine Learning

1810.03372

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Low Rank Structure of Learned Representations

Sanyal, Amartya, Kanade, Varun, Torr, Philip H. S.

arXiv.org Artificial IntelligenceApr-19-2018

A key feature of neural networks, particularly deep convolutional neural networks, is their ability to "learn" useful representations from data. The very last layer of a neural network is then simply a linear model trained on these "learned" representations. Despite their numerous applications in other tasks such as classification, retrieval, clustering etc., a.k.a. transfer learning, not much work has been published that investigates the structure of these representations or whether structure can be imposed on them during the training process. In this paper, we study the dimensionality of the learned representations by models that have proved highly succesful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that when trained on CIFAR10 or CIFAR100 datasets, the learned representations exhibit a fairly low rank structure. We propose a modification to the training procedure, which further encourages low rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1804.0709

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Learning the Nature of Information in Social Networks

Agrawal, Rakesh (Microsoft) | Potamias, Michalis (Groupon) | Terzi, Evimaria (Boston University)

AAAI ConferencesFeb-22-2012

We postulate that the nature of information items plays a vital role in the observed spread of these items in a social network. We capture this intuition by proposing a model that assigns to every information item two parameters: endogeneity and exogeneity. The endogeneity of the item quantifies its tendency to spread primarily through the connections between nodes; the exogeneity quantifies its tendency to be acquired by the nodes, independently of the underlying network. We also extend this item-based model to take into account the openness of each node to new information. We quantify openness by introducing the receptivity of a node. Given a social network and data related to the ordering of adoption of information items by nodes, we develop a maximum-likelihood framework for estimating endogeneity, exogeneity and receptivity parameters. We apply our methodology to synthetic and real data and demonstrate its efficacy as a data-analytic tool.

exogeneity, information item, node, (16 more...)

AAAI Conferences

Sixth International AAAI Conference on Weblogs and Social Media

Country: North America > United States (0.14)

Industry:

Information Technology > Services (0.82)
Government (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback